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ABSTRACT 


/This  paper  reports  the  results  of  a  statistical  analysis  of  the 
performance  of  three  branch  and  bound  algorithms  for  the  general  (asym¬ 
metric)  traveling  salesman  problem  on  randomly  generated  test  problems 
with  up  to  325  cities.  Three  types  of  functions,  polynomial,  super¬ 
polynomial  (log-exponential)  and  exponential,  were  fitted  to  the  perform¬ 
ance  data  of  each  of  the  algorithms  by  least  squares  techniques.  The 
three  functions  describe  almost  equally  well  the  behavior  of  the  algo¬ 
rithms  in  the  range  of  problem  sizes  examined.  . 
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1.  Introduction 


Given  n  cities  and  a  nonnegative  cost  c^  of  traveling  from  city  i  to 
city  j,  the  traveling  salesman  problem  (TSP)  asks  for  a  minimum  cost  tour  of 
the  n  cities.  In  graph- theoretical  terms,  the  n  cities  are  the  nodes  of  a 
complete  directed  graph  G  =  (N,A)  and  an  optimal  tour  is  a  minimum  cost  di¬ 


rected  Hamilton  cycle.  The  TSP  is  called 


if  c^j  -  c^  for  all  i,J, 


asynmetric  otherwise.  By  associating  a  variable  with  every  arc,  defined 
to  be  1  if  (i,j)  is  an  arc  of  the  tour  (Hamilton  cycle)  to  be  constructed  and 
0  otherwise,  one  can  write  the  TSP  (see  [5  ])  as 


n  n 


(1.1)  min  2  2c. ,x 

1=1  j=l  1J 

subject  to 


(1.2) 


(1.3) 


(1.4) 


xij,0on, 


i  *  1, . . .  ,n 


j  *  1 . n 


i, j  “  1, . • . ,n 


£  **,,<1*1-1.  **  =  U . ■>).  2<  |s|  <n-l. 

ieS  jeS  1J 


Without  (1.4),  the  above  problem  is  the  n  x  n  assignment  problem  (AP), 
equivalent  to  the  linear  program  (1.1),  (1.2)  and 

(1.3')  >  0,  l.j  -  1 . . 

A  solution  to  AP  is  either  a  tour  or  a  collection  of  subtours  (direc¬ 
ted  cycles  of  length  less  than  n),  and  the  role  of  the  inequalities  (1.4) 
(called  subtour  elimination  constraints)  is  precisely  to  exclude  the  occurrence 
of  sub tours  in  a  solution.  On  the  other  hand,  the  presence  of  (1.4)  also  re¬ 
quires  that  of  (1.3),  since  the  polyhedron  defined  by  (1.2),  (1.3 7)  and  (1.4) 
has  fractional  vertices.  Thus  the  TSP  is  an  integer  program. 

Although  it  has  a  very  special  structure,  the  TSP  still  has  the  reputa¬ 
tion  of  a  notoriously  difficult  combinatorial  problem.  It  is  known  to  be 
NP-complete,  and  hence  in  all  likelihood  there  exists  no  polynomial  time  al¬ 
gorithm  for  solving  it  -  in  the  worst  case  sense,  i.e.,  in  the  sense  of  being 
guaranteed  to  solve  every  instance  of  the  TSP.  This,  however,  leaves  open 
the  question  concerning  the  expected  or  average  time  required  to  solve  a  ran¬ 
domly  chosen  TSP.  Whether  the  expected  running  time  of  various  TSP  algorithms 
is  an  exponential  or  a  polynomial  function  of  n,  or  perhaps  something  in  be¬ 
tween  like  a  superpolynomial  or  log-exponential  function,  is  at  present  an 
open  question.  A  probabilistic  analysis  of  the  problem,  although  very  tempting 

seems  difficult.  We  say  it  is  tempting,  since  the  AP,  one  of  the  most  conmon 

3  2 

relaxations  of  the  TSP,  is  solvable  in  0(n  )  time  in  general,  and  in  0(n  ) 

time  when  processed  repeatedly  in  the  context  of  a  branch  and  bound  procedure 

for  the  TSP.  At  the  same  time,  the  AP  has  n!  solutions  (assignments),  of 

which  (n-1)!  are  TSP  solutions  (tours).  Furthermore,  in  the  context  of  the 

TSP  only  those  assignments  are  of  interest  that  contain  no  diagonal  elements 


0,  i  =  l,...,n),  and  while 


(of  the  assignment  tableau,  i.e.,  satisfy  x^  = 
the  assignment  problem  with  this  extra  constraint  is  as  easy  to  solve  as  the 
original  one,  the  number  of  its  solutions  is  n!/e  rounded  to  the  nearest  integer, 
where  e  is  the  base  of  the  natural  logarithm  (see,  for  instance,  [6],  p.  10). 
Thus  on  the  average  one  in  every  n/e  "diagonal- free"  assignments  is  a  tour. 

This,  together  with  the  fact  that  a  breadth-first  branch  and  bound  procedure 
can  rank  the  k  best  diagonal-free  assignments  by  solving  at  most  kn  assignment 

3 

problems,  hence  by  a  computational  effort  of  0(kn  ),  suggests  that  a  procedure 
of  this  type  might  be  able  to  find  an  optimal  tour  in  a  TSP  with  randomly  gen- 

4 

erated  costs  by  an  average  computational  effort  of  0(n  ).  For  this  to  be  true, 
however,  the  probability  distribution  of  the  costs  would  have  to  be  such  that, 
roughly  speaking,  if  all  assignments  are  listed  in  order  of  increasing  costs, 
tours  are  evenly  distributed  throughout  the  list  (rather  than  "clustered"  in 
certain  parts  of  it).  At  this  point  it  is  not  clear  whether  this  is  the  case 
for  any  distribution  of  the  costs.  Several  attempts  at  a  probabilistic  anal¬ 
ysis  essentially  based  on  the  above  considerations  have  either  argued  that 
this  is  very  likely  to  be  the  case  (see  [  3] ,  and  also  the  letter  [ 7  ]  pointing 
out  the  inadequacy  of  the  argument),  or  have  simply  assumed  it  to  be  true  [91.  . 
Needless  to  say,  this  does  not  settle  the  problem. 

While  the  theoretical  issue  remains  unresolved,  it  seems  of  considerable 
interest  to  examine  from  this  point  of  view  the  empirical  performance  of  some 
of  the  more  efficient  TSP  algorithms  on  problems  with  randomly  generated  costs. 
With  this  in  r ind,  we  have  undertaken  a  statistical  analysis  of  the  performance 
of  three  well  known  AP-based  TSP  algorithms,  as  reported  in  the  literature  by 
their  authors.  To  be  more  specific,  we  fitted  various  types  of  approximating 
functions  to  the  published  performance  data  for  each  algorithm,  in  an  attempt 


to  decide  which  type  of  function  describes  best  the  relationship  between 
problem  size  as  defined  by  n,  and  solution  time.  Our  aim  in  reporting  these 
results  is  not  to  compare  the  algorithms,  but  to  compare  the  performance  of 
the  different  approximating  functions  that  we  tested. 

2.  The  Algorithms 

The  three  algorithms  that  we  examined  are  those  of  Smith,  Srinivasan 
and  Thompson  [10],  Carpaneto  and  Toth  [4],  and  Balas  and  Christofides  [1], 
to  be  denoted  in  the  following  by  SST,  CT  and  BC,  respectively.  All  three 
are  enumerative  procedures  using  the  AP  (with  forbidden  diagonal  elements)  as 
a  relaxation  of  the  TSP.  However,  the  SST  and  CT  algorithms  use  the  AP  with 
the  original  objective  function  (1.1),  whereas  the  BC  procedure  uses  a  La- 
grangean  constructed  by  taking  into  the  objective  function,  with  appropriate 
multipliers,  selected  inequalities  of  the  type  (1.4),  or  inequalities  derived 
from  (1.2)  and  (1.4).  Whatever  the  objective  function  of  a  problem  P,  we  de¬ 
note  its  optimal  value  by  v(P). 

All  three  procedures  are  initialized  by  putting  the  TSP  on  the  list  of 
active  subproblems,  and  each  of  them  stops  when  the  list  of  active  subproblems 
is  exhausted.  The  subproblems  on  the  list  differ  from  the  original  TSP  (and 
from  each  other)  by  specified  subsets  of  arcs  that  are  forcibly  included  (I) 
into,  or  excluded  (E)  from  the  solution. 

In  each  of  the  three  algorithms,  at  a  typical  iteration  some  or  all 
of  the  following  steps  may  be  executed: 

Subproblem  Selection.  Choose  a  subproblem  TSP^  from  the  list  according 
to  some  rule. 


The  SST  algorithm  uses  the  rule  known  as  depth  first  or  LIFO  (last  in 
first  out).  This  consists  of  always  choosing  one  of  the  subproblems  (nodes 
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of  the  search  tree)  generated  at  the  last  branching  step,  in  the  order  of 
nondecreasing  lower  bound  (as  given  by  the  AF  solution  value  plus  penalties 
associated  with  each  node);  and  when  no  more  such  subproblems  exist,  back¬ 
tracking  to  the  parent  node  and  applying  the  same  rule  to  its  brothers.  This 
rule  has  the  advantage  of  modest  storage  requirements  and  easy  bookkeeping. 

Its  disadvantage  is  that  possible  erroneous  decisions  (concerning  arc  inclu¬ 
sion  or  exclusion)  made  early  in  the  procedure  cannot  be  reversed  until  late 
in  the  procedure. 

The  subproblem  selection  rule  used  by  the  CT  algorithm  is  known  as 
breadth  first.  It  consists  of  always  choosing  the  subproblem  with  the  best 
(smallest)  lower  bound.  This  rule  has  the  desirable  feature  of  keeping  the 
search  tree  as  small  as  possible,  but  on  the  other  hand  it  requires  consider¬ 
able  storage  space. 

The  BC  algorithm  uses  a  combination  of  the  depth  first  and  breadth 
first  rules:  a  successor  of  the  current  subproblem  is  selected  whenever 
available;  otherwise  a  node  with  best  (smallest)  evaluation  E  is  chosen  from 
the  list,  where  E  is  based  on  the  value  of  the  lower  bound,  corrected  for  the 
"distance"  of  the  subproblem  solution  from  a  tour. 

Having  chosen  a  subproblem  from  the  list,  the  SST  and  BC  algorithms 
go  to  Lower  Bounding,  whereas  the  CT  procedure  goes  to  Branching. 

Branching.  Break  up  the  feasible  set  of  TSP^  by  augmenting  the  sets 
1^  and  E^  of  Included  and  excluded  arcs,  respectively. 

The  SST  algorithm  chooses  a  minimum  cardinality  subtour  in  the  current 
solution  to  AP^,  whose  free  arc  set  is,  say,  £ (i^ , j^) , . . . , (i , j  ) },  and  creates 
t  successors  of  node  i,  say  11,..., it,  by  defining 


(2.1) 


Ell-SiUUi.ji)}, 

Eit  =  EiUt(it’jt)}’  Xit  =  I1Ut(11»J1> . (it-l’jt-l)}* 

The  CT  algorithm  uses  the  same  rule  (2.1),  but  chooses  for  branching 

a  subtour  whose  set  of  free  arcs  Is  of  minimum  cardinality. 

The  BC  algorithm  uses  intermittently  two  branching  rules,  one  of  which 

is  (2.1),  while  the  other  one  is  derived  from  the  principle  of  conditional 

bounds,  which  can  be  stated  as  follows.  Let  H  be  the  arc  set  of  the  current 

best  tour,  let  c^  >  0,  i,j  *  l,...,n,  be  a  set  of  reduced  costs  associated 

with  the  linear  program  (1.1),  (1.2),  (1.3'),  (1.4),  and  let  SCH,  S  = 

l(i,  >.),)>•• .»(!_» j  )}  be  such  that 
i  i  P  P 

E  c  >U-L(c), 

(i,j)«S1J 


where  U  is  the  current  upper  bound,  and  L(c)  is  the  lower  bound  associated 
with  the  cost  vector  c. 

Further,  let  Q^,  r  =  l,...,p  be  arc  sets  of  G  satisfying 


r|(i,j)«Qr  1rjr 


J.  “  Cij 


*  (i,j)e4. 


Then  every  tour  of  value  less  than  U  satisfies  the  disjunction 
-  0,  (i,j)cQ1  V...V  x  -  0,  (i,J)cQ  . 

This  disjunction  can  be  used  to  create  p  successors  of  node  i,  by 


defining 


The  rules  (2.1)  and  (2.2)  are  used  intermittently,  since  their  relative 
strengths  may  differ  at  different  nodes  of  the  search  tree.  The  choice  is 
based  on  a  coefficient  of  relative  strength. 

After  branching,  the  SST  and  BC  algorithms  place  the  newly  generated 
subproblems  on  the  list,  and  go  to  Subproblem  Selection.  The  CT  algorithm 
instead  goes  to  Lower  Bounding. 

Lower  Bounding.  Calculate  a  lower  bound  on  v(TSP^).  This  is  known  to 
be  the  crucial  ingredient  of  any  branch  and  bound  method,  in  that  the  stronger 
the  bounds  that  are  derived,  the  fewer  subproblems  have  to  be  examined. 

The  SST  algorithm  generates  the  lower  bound  L^  =  v(AP^)  by  solving  AP^ 
when  TSP^  is  selected  from  the  list.  If  L^  >  U  or  the  solution  to  AP^  is  a 
tour,  the  algorithm  returns  to  Subproblem  Selection,  otherwise  it  goes  to 
Branching.  After  branching  it  also  calculates  an  estimated  lower  bound  for 
the  new  nodes  generated,  by  adding  to  v(AP^)  the  penalty  associated  with  the 
arc  exclusions  and  inclusions  prescribed  by  the  branching  rule. 

The  CT  algorithm  solves  the  assignment  problem  for  each  node  generated 
in  the  Branching  step  as  soon  as  it  was  generated,  and  places  on  the  list 
only  those  successors  of  TSP^  for  which  the  lower  bound  obtained  is  below 
the  current  upper  bound  U.  If  any  of  the  solutions  to  the  new  assignment 
problems  is  a  tour  of  improved  value,  U  is  updated.  The  algorithm  then  re¬ 
turns  to  Subproblem  Selection. 

The  BC  algorithm  uses  an  elaborate  lower  bounding  procedure  that  goes 
well  beyond  solving  the  assignment  problem.  If  the  optimal  solution  x  to 
the  latter  defines  a  tour,  U  is  updated  and  the  algorithm  returns  to  Subprob¬ 
lem  Selection.  Otherwise,  a  Lagrangean  function  is  constructed  by  combining 
the  original  objective  function  with  a  family  of  valid  inequalities  (mostly 


facets  of  the  TSP  polyhedron)  derived  from  (1.4)  and  (1.2),  with  multipliers 
chosen  in  such  a  way  that  the  optimal  solution  x  to  the  assignment  problem 
remains  optimal  for  the  new  objective  function.  If  the  inequalities  to  be 
used  are  written  in  the  generic  form 


Z  Z  a 
ieN  jeN 


ijXij  -  3 


t 

o 


teT 


for  some  index  set  T,  then  the  Lagrangean  is 


L(w) 


min  Z  Z  (c  -  Z  w  aj  )x  +  Z  w  a* 
xeX  ieN  jeN  teT  c  teT 


where  X  is  the  assignment  poly tope  and  w  the  vector  of  multipliers.  L(w) 
provides  a  valid  lower  bound  for  any  vector  of  w  >  0.  The  strongest  bound 
is  obviously  given  by  the  Lagrangean  dual  max[L(w):  w  >  0},  which  however 
is  difficult  to  solve  in  the  given  instance.  The  BC  algorithm  replaces  the 
Lagrangean  dual  with  the  restricted  Lagrangean  problem 


(RL) 


f 

w  >  0 

] 

f  I*(w) 

'**  c.  .  if  x.  .  =  1  / 

u.+v  +  Z  w  ar  .  t 

ij  ij 

/  1  J  t«T  *  « 

<  c. .  if  x.  =  0 

V 

l-  ij  ij  ; 

and  finds  a  good  approximate  solution  to  (RL).  In  particular,  it  uses  a 
sequence  of  polynomial  time  algorithms  (bounding  procedures)  for  identifying 
new  inequalities  that  can  be  added  to  L(w)  with  a  positive  multiplier  wfc 
without  changing  the  other  components  of  w. 

When  the  sequence  of  lower  bounding  procedures  is  exhausted,  the  BC 
goes  to  Upper  Bounding. 

Upper  Bounding.  The  SST  and  CT  algorithms  do  not  use  special  upper 


bounding  procedures,  but  generate  upper  bounds  as  a  byproduct  of  solving  AP: 
whenever  the  solution  is  a  tour,  it  provides  such  a  bound. 


In  the  BC  algorithm,  the  Lagrangean  function  L(w),  besides  providing 

a  strong  lower  bound,  also  plays  the  role  of  defining  an  "admissible  subgraph" 

G  =  (N,A  ),  whose  arc  set  is 
o  ’o’ 

A0  -  {(i,j)«A|u  +v  +  S  wtaj  =  c  }. 

J  teT  J  J 

The  inclusion  of  each  new  inequality  into  L(w)  adds  at  least  one  new 
arc  to  the  set  Aq,  and  when  the  lower  bounding  procedures  are  exhausted,  Gq 
is  strongly  connected  and  without  articulation  points.  The  BC  algorithm  then 
uses  a  specialized  implicit  enumeration  procedure  with  a  cut-off  rule  in  an 
attempt  to  find  a  tour  in  Gq.  If  a  tour  H  is  found,  U  is  updated.  Furthermore, 
by  construction  Gq  has  the  property  that  if  all  inequalities  with  w£  >  0  are 
tight  for  H,  then  H  is  optimal  for  the  current  subproblem  TSP^.  In  this  case 
the  algorithm  returns  to  Subproblem  Selection;  otherwise  it  removes  from  G 
all  arcs  (i,j)  whose  reduced  costs  exceed  U-L(w)  (i.e.,  fixes  at  0  the  corre¬ 
sponding  variables)  and  goes  to  Branching. 

The  main  characteristics  of  the  three  algorithms  as  implemented  by 
their  authors  are  summarized  in  Table  1. 

3.  The  Data 

Performance  data  for  solving  the  asymmetric  TSP  are  reported  in  [10], 

[  4]  and  [1],  for  FORTRAN  implementations  of  the  SST,  CT  and  BC  algorithms, 
respectively  (see  also  [2]),  and  are  reproduced  for  convenience  in  Table  2. 

Each  of  the  three  codes  was  run  on  (different)  sets  of  asymmetric  TSP's  whose 
costs  were  drawn  independently  from  the  discrete  uniform  distribution  on 

{i:  i  =  1,2 . 1000}.  The  data  consist  of  the  arithmetic  average  computing 

time  (in  seconds)  and  arithmetic  average  number  of  nodes  of  the  search  tree. 

This  latter  number  refers  to  all  nodes  generated  in  the  case  of  the  CT  and 


Relaxation 


Branching 

rules 


Subproblem 

selection 


SST 

CT 

AP  with 

TSP  objective 

AP  with 

TSP  objective 

v (AP^ ) ,  ob  tained 

by  parametric 
simplex  method, 
plus  penalties 

v (AP^ ) ,  ob  tained 

by  Hungarian 
method  (post- 
optimizing 
version) 

(2.1) 

(2.1) 

depth  first 

breadth  first 

no  special 
procedure 

no  special 
procedure 

no 

no 

I 

AP  with 
Lagrangian 
obj ective 


lower  bound 
on  v(RL) , 
obtained  by 
polynomial- time 
approximation 
procedures 


(2.1)  and  (2.2) 


depth  first  upon 
forward  step, 
breadth  first 
upon  backtracking 


tour- finding 
heuristic 
applied  to 
admissible 
graph 


Nodes  of  the  search  tree 


Computing  time  (seconds) 


n 

SST(1) 

ct(2) 

bc(2) 

SST^ 

ct<4> 

bc(5) 

40 

26 

27 

- 

2.9 

0.9 

50 

11 

- 

12 

1.7 

- 

0.2 

60 

39 

24 

- 

9.3 

2.2 

- 

70 

32 

- 

- 

8.5 

- 

- 

75 

- 

- 

27 

- 

0.3 

80 

32 

42 

- 

13.8 

6.6 

- 

90 

82 

- 

- 

42.0 

- 

- 

100 

87 

56 

39 

53.0 

10.4 

0.7 

110 

24 

- 

- 

22.3 

- 

- 

120 

65 

61 

- 

62.9 

16.2 

- 

125 

- 

- 

43 

- 

- 

1.1 

130 

97 

- 

- 

110.1 

- 

- 

140 

130 

57 

- 

165.2 

18.7 

- 

150 

50 

- 

46 

65.3 

- 

2.0 

160 

70 

73 

- 

108.5 

32.8 

- 

170 

98 

- 

- 

169.8 

- 

- 

175 

- 

- 

58 

- 

- 

4.2 

180 

215 

69 

- 

441.4 

28.8 

- 

200 

- 

58 

63 

- 

35.7 

6.1. 

220 

- 

43 

- 

- 

46.7 

- 

225 

- 

- 

84 

- 

- 

10.4 

240 

- 

63 

- 

- 

53.4 

- 

250 

- 

- 

89 

- 

- 

13.7 

275 

- 

- 

106 

- 

- 

21.7 

300 

- 

- 

124 

- 

- 

38.4 

325 

- 

- 

142 

- 

- 

49.7 

BC  algorithms,  but  only  to  the  nodes  selected  and  explored  In  the  case  of 
SST.  The  averages  are  based  on  5,  20,  and  10  replications  for  each  problem 
size  for  SST,  CT,  and  BC,  respectively.  The  SST  algorithm  solved  problems 
of  sizes  40,  50,..., 180;  CT  solved  problems  of  sizes  40,  60,..., 240;  and 
BC  solved  sizes  50,  75,..., 325.  The  computations  were  performed  on  a  UNIVAC 
1108,  CDC  6600,  and  CDC  7600  for  the  SST,  CT,  and  BC  algorithms,  respectively. 
We  note  that  the  CDC  7600  is  approximately  three  times  faster  than  the  CDC 
6600  or  the  UNIVAC  1108  (which  are  of  roughly  equal  speed) . 

The  natural  logarithms  of  the  average  solution  times  for  the  three 
algorithms  are  plotted  against  the  natural  logarithm  of  problem  size  in 
Figure  1.  A  straight  line  relationship  would  suggest  that  solution  time  is 
a  polynomial  function  of  problem  size.  The  SST  curve  is  the  least  smooth  of 
the  three,  undoubtedly  because  each  point  is  an  average  of  only  five  trials. 
The  BC  points  exhibit  the  least  variance  around  a  smooth  function,  although 
they  are  averages  of  only  ten  trials  whereas  the  CT  points  are  averages  of 
twenty  trials. 

We  also  have  solution  time  data  for  each  of  the  twenty  trials  for  each 
of  the  eleven  problem  sizes  for  the  CT  algorithm.  Table  3  presents  some 
relevant  summary  statistics  for  these  data.  me<*i-ans  are  peculiarly  in¬ 

sensitive  to  problem  size  for  160  <  n  <  240.  Hence  the  increases  in  the  arith 
metlc  and  geometric  means  for  those  problem  sizes  are  due  to  the  right  tail 
of  the  distribution,  which  is  corroborated  by  the  75th  percentile  solution 
times.  Overall,  the  various  summary  statistics  for  the  distribution  of  solu¬ 
tion  times  are  highly  correlated  (see  Table  4).  The  standard  deviations  of 
the  solution  times  increase  strongly  with  problem  size;  even  the  standard 


Table  4:  Simple  correlations  between  pairs  of  summary  statistics 
for  solution  times  for  the  CT  algorithm 


(logftj  ) 


log[t^] 


Log  of 

Log  of 

Arithmetic 

arithmetic 

Geometric 

geometric 

mean 

mean 

mean 

mean 

<V 

(log[til ) 

(log[ t^] ) 

i  0.886 

.986 

0.905 

.875 

.998 

0.904 

.896 

.863 

.941 

0.872 

.984 

.901 

.999 

.899 

.751 

.775 

.685 

.734 

Median 


(t‘5) 


75th 

percentile 

(t;75) 


0.948 

.634 


0.689 
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deviations  of  the  natural  logarithms  of  solution  times,  s*°8,  increase  with 


problem  size,  with  a  correlation  of  0.78. 


4.  The  Models 


To  represent  computing  time  as  a  function  of  problem  size,  we  inves¬ 
tigated  models  of  the  form 

(4.1)  t±  =  ffa^.QVe*1^, 

where  n^  is  problem  size,  t^  is  the  solution  time  for  the  j-th  trial  for 
problem  size  n^,  0  is  a  vector  of  unknown  parameters,  and  is  an  error 
factor  with  mean  EC®^)  equal  to  zero  and  variance  E(e^)  equal  to  ct^,  where 
E  is  the  expectation  operator.  Taking  natural  logarithms  of  both  sides  of 
(4.1)  yields 


(4.2) 


log  t  =  log  f(ni;e)+«1 


and  denoting  by  N  and  M  the  number  of  problem  sizes  and  trials,  respectively, 
the  problem  of  minimizing  the  sum  of  squared  errors  (known  as  the  least 
squares  problem)  is 


N  M 


(4.3) 


min  £  E  [log  t  -log  f(n  ;0)]  . 
0  i«l  j*l  1J  1 


At  least  for  functions  f(ni;9)  which  are  linear  in  0,  least  squares  estima¬ 
tors  are  efficient  (minimum  variance)  if  the  errors  are  homoscedastlc  (equal 
variance)  and  uncorrelated  [EU^c^)  -  0  unless  i  *  h  and  j  -  k] .  The 
multiplicative  model  (4.1)  or  (4.2)  eliminates  the  worst  of  the  correlation 
between  the  error  variance  and  n^,  although  even  in  this  model  some  correla¬ 
tion  remains  for  the  Carpaneto-Toth  algorithm  (see  Table  3). 


We  shall  focus  on  three  approximating  functions:  polynomial,  super¬ 


polynomial  (or  log-exponential),  and  exponential,  defined  by 

(4.4)  f(nt)  “  a  ^  or  log  f(n^)  =  a  +  Plogn^ 

(4.5)  f(ni)  =  ck  ,n^logni  or  log  f(n^)  =  a  +  ^log2n1> 

and 

(4.6)  f(n^)  =  a'e^ni  or  log  f(n^)  =  a  +  3^ 

respectively,  where  a  *  log  a '  and,  as  before,  log  denotes  the  natural 
logarithm. 

Since  we  have  only  arithmetic  average  solution  times  for  each  problem 
size  for  the  SST  and  BC  algorithms,  we  cannot  solve  the  least  squares  prob¬ 
lem  (4.3).  Averaging  (4.2)  over  j  produces 

(4.7)  log  t^  =  log  f(n1;0)+*t, 

where  t^  is  the  geometric  mean  of  the  t^'s  (or,  equivalently,  log  t^  is  the 
arithmetic  mean  of  the  log  t^'s),  and  is  the  arithmetic  mean  of  the  c^'s, 
over  all  j.  But  since  we  cannot  compute  t^  for  the  SST  and  BC  algorithms, 
we  replace  t^  with  the  arithmetic  mean  t^  for  all  three  algorithms,  and  actu¬ 
ally  solve  the  least  squares  problem 

N  -  2 

(4.8)  min  E  [log  t, -log  f (n  ; ©>]  . 

9  i=l  1  1 

Since  the  correlation  between  t^  and  t^  is  0.986  and  the  correlation  between 
log(t^)  and  log(tp  is  0.998  for  the  CT  algorithm,  this  approximation  should 
not  affect  our  results  importantly. 


Discriminating  among  models  is  difficult  over  the  range  of  problem 
sizes  spanned  by  our  data,  especially  with  the  small  number  of  observations 
available.  Table  5  displays  the  simple  correlations  between  the  predicted 
solution  times  for  the  sample  problem  sizes  for  each  pair  of  models  (poly¬ 
nomial,  superpolynomial,  exponential)  for  each  of  the  three  algorithms  (SST, 

CT,  BC).  The  smallest  correlation  is  0.968  between  the  polynomial  and  expon¬ 
ential  models  for  the  BC  algorithm.  These  high  correlations  suggest  why  it 
is  difficult  to  estimate  which  function  best  describes  the  relationship  be¬ 
tween  problem  size  and  solution  time  for  each  algorithm. 

5 .  The  Results 

We  fitted  the  solution- time  data  for  each  of  the  three  algorithms  to 

each  of  the  three  models.  The  results  are  as  shown  in  Table  6. 

Each  of  the  models  is  linear  in  the  parameters  a  and  8  (see  (4.4)- (4.6)) 

and  is  of  the  general  form  log  t^  =  where  x^  *  n^  for  the  exponen- 

2 

tial  model,  =  log  n^  for  the  polynomial  model,  and  x^  =  log  n^  for  the 
superpolynomial  model,  and  where  n^  denotes  the  number  of  cities  as  a  measure 
of  problem  size,  while  log  denotes  the  natural  logarithm.  The  symbols  &  and 

A 

8  stand  for  the  least  squares  estimates  of  a  and  8»  respectively,  while  s- 
and  s*  denote  the  .standard  errors  (or  square  roots  of  the  estimates  of  the 

A  A 

variances)  of  a  and  8,  respectively.  The  smaller  the  standard  error  of  an 
estimated  parameter,  the  more  precise  is  our  estimate  of  that  parameter. 

S  is  the  standard  error  of  the  regression,  defined  as 
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Table  6:  Least  squares  estimates  of  the  model 
parameters,  and  associated  statistics 


SST 


CT 


BC 


A 

a 

A 

e 

S 

R2 

DW 

(sA) 

a 

(8p) 

(Signif) 

Polynomial 

-11.39 

(1.37) 

3.243 

(0.296) 

0.5132 

0.895 

2.29 

Superpolynomial 

-4.187 

(0.716) 

0.361 

(0.0328) 

0.5108 

0.896 

2.26 

Exponential 

-0.0823 

(0.406) 

0.0331 

(0.00343) 

0.5745 

0.868 

1.74 

Polynomial 

-8.232 

(0.531) 

2.256 

(0.110) 

0.1980 

0.977 

1.34 

Superpolynomial 

-3.307 

(0.373) 

0.241 

(0.0155) 

0.2598 

0.960 

0.93 

(0.05) 

Exponential 

-0.789 

(0.369) 

0.0140 

(0.00215) 

0.3462 

0.929 

1.52 

Polynomial 

-14.51 

(0.953) 

3.114 

(0.186) 

0.3607 

0.962 

0.83 

(0.05) 

Super polynomial 

-7.046 

(0.367) 

0.320 

(0.0136) 

0.2600 

0.980 

1.05 

Exponential 

-2.470 

(0.137) 

0.0205 

(0.00066) 

0.1985 

0.989 

0.96 

(0.05) 

Notes: 

A  A 

a>@  3  least  squares  estimates  of 

A  A 

s»,s*  *  standard  error  of  a, 3 


v 

'Ms 

s? 


R2 


*  standard  error  of  regression 

*  squared  (adjusted)  coefficient  of  multiple  correlation 
DW  *  Durbln-Watson  statistic 

Signif  *  level  of  significance 


J 


where  p  is  the  dimensionality  of  0  (here  p  *  2,  since  8  =  (a,@),  and 


=  log  t^-f(n^;0).  S  is  a  measure  of  the  average  distance  of  the  data 

points  from  the  regression  line. 

—2 

R  is  the  (adjusted)  squared  multiple  correlation  coefficient  (also 
called  coefficient  of  (multiple)  determination),  defined  as 


(5.2) 

where 


R2  =  1-(S2/S2), 


-2  1  N  -  2 
Sy  =  FT  }  , 


and  where  denotes  log  t^,  while  y  is  the  arithmetic  mean  of  the  y^s.  R2 
is  the  fraction  of  the  variance  of  the  log  t^'s  explained  by  the  regression. 

The  closer  to  1  is  the  coefficient  of  determination,  and  the  closer  to  0  is  the 
standard  error  of  the  regression  for  a  certain  model,  the  better  that  model 
fits  the  data. 

DW  stands  for  the  Durbin-Watson  statistic,  defined  as 


(5.3) 


DW  = 


N 

i  <V\  iy 

i=2  1  11 

N  o 

2  i 

i=l  1 


It  is  a  measure  of  the  association  of  "adjacent"  residuals.  If  the  residuals 
are  uncorrelated,  DW  ■  2;  if  they  are  perfectly  positively  correlated,  DW  *  0; 
and  if  they  are  perfectly  negatively  correlated,  DW  -  4.  In  our  context,  a 
Durbin-Watson  statistic  significantly  less  than  2  for  a  certain  model  in¬ 
dicates  that  the  model  systematically  overestimates  and  underestimates  ranges 
of  the  curve  described  by  the  data,  whereas  a  DW  statistic  significantly 
greater  than  2  indicates  frequent  alternations  of  over-  and  under-estimates. 

For  further  details  on  the  statistical  concepts  used  here  and  in  the 
rest  of  the  paper,  the  reader  is  referred  to  t  8 ]  and  [11]. 


Based  on  the  standard  error  of  the  regression  and  the  coefficient  of 
determination,  none  of  the  three  models  fits  the  data  of  the  SST  algorithm 
very  well,  although  the  polynomial  and  superpolynomial  models  fit  them  slightly 
better  than  the  exponential  one.  The  performance  of  the  CT  algorithm  is  best 
described  by  the  polynomial  function,  with  the  superpolynomial  a  close  second, 
and  the  exponential  third.  The  exponential  function  fits  best  the  BC  data, 
with  the  superpolynomial  a  close  second  and  the  polynomial  third. 

Although  the  exponential  function  fits  the  BC  data  better  than  the 
other  two  functions,  its  DW  statistic  is  nevertheless  significant  at  the  5% 
level,  i.e.,  even  this  "best"  function  systematically  fails  to  capture  the 
curvature  of  the  "true"  function  behind  the  data  (the  57,  significance  level 
means  that  the  chances  of  getting  a  value  as  different  from  2  as  0.96  if 
in  fact  the  errors  are  uncorrelated,  are  less  than  one  in  20).  The  JXf  sta¬ 
tistics  for  the  other  "best"  models  do  not  significantly  differ  from  2. 

More  significant  than  the  differences  in  model  rankings  for  the  three 
algorithms  is  the  relative  closeness  of  the  fit  for  all  three  models,  for 
each  of  the  algorithms.  Although  the  CT  algorithm  is  best  described  by  the 
polynomial  model  and  the  BC  algorithm  by  the  exponential  model,  note  that 
both  the  exponential  model  for  the  CT  algorithm  and  the  polynomial  model 
for  the  BC  algorithm  fit  the  data  considerably  better  than  any  of  the  models 
for  the  SST  algorithm. 

Since  we  are  particularly  interested  in  the  asymptotic  behavior  of 
these  algorithms,  it  is  important  that  there  not  be  any  systematic  discrep¬ 
ancies  between  the  actual  data  points  and  their  estimated  values  for  the 

larger  problem  sizes.  For  example,  the  model  =  a  +  0xt  +  will  fit  data 

2 

generated  by  y ^  ■  x  very  well  over  broad  ranges  of  x. ;  but  the  model  will  tend 


to  underestimate  y^  for  small  and  large  values  of  x^  and  overestimate  y^  for 
intermediate  values  of  x^.  One  clue  in  this  case  would  be  a  DW  statistic 
near  zero.  However,  one  cannot  count  on  the  DW  statistic  in  a  more  compli¬ 
cated  situation. 

There  is  a  simple  procedure  for  testing  whether  the  data  for  small  and 

large  problem  sizes  can  be  described  adequately  by  the  same  model.  Partition 

the  data  into  two  subsets:  1^  =  {1,2, . . . , [j]  },  =  fN  +  1  -  [  —]  ,  N  +  2  - 

N  t  N  N 

[— ] ,  ...»  N},  where  [j]  is  the  greatest  integer  in  j.  Estimate  the  model  for 

2 

each  subset  of  the  data  by  least  squares  and  let  be  the  sum  of  squared 

A.k 

residuals  (denoted  e^)  for  the  model  associated  with  1^: 

(5.4)  S2  =  E  (eV. 

i'1* 

2 

Let  1=1,  U  I.  and  define  S  =  E,  T«4 •  (Note  that  for  N  odd  the  set  I  excludes 
i  2.  iei  1 

the  middle  observation.)  Then  the  statistic 


(5.5) 


S2  -  (S2  +  S22)  2[|] 

2  2  *  i 

Sl  +  S2 


has  the  F  (Fisher's)  distribution  with  2  degrees  of  freedom  in  the  numerator 
N 

and  2[j]  -  4  degrees  of  freedom  in  the  denominator.  The  larger  this  statistic 
is,  the  less  likely  it  is  that  the  data  for  1^  and  can  be  described  by  the 
same  model. 

The  legitimacy  of  this  test  depends,  inter  alia,  on  the  homoscedasticity 


of  the  errors  e^.  If  we  hypothesize  that  var(e^)  =  a^,  i  e  1^,  then  the 


statistic 


2  2 

can  be  used  to  test  the  null  hypothesis  H^:  against  the  alternative 

2  2  N 

hypothesis  H  :  o.  f  a0.  The  statistic  tu  has  the  F  distribution  with  [— ]  - 

degrees  of  freedom  in  the  numerator  and  denominator.  The  null  hypothesis  is 

rejected  if  tu  is  sufficiently  different  from  1  (the  synmetry  occurs  because 

2  2  *>  2 

if  Hq  is  false,  either  or  a“  >  a^).  These  tests  were  performed  with 

the  results  shown  in  Table  7. 


Table  7.  Tests  for  homoscedasticitv  (uu)  and  model  stabilit 


CT 


-"■>~^Alg  o  r  i  thm 
Model  "" 


Polynomial 


Exponential 


s 

ST 

(U 

u 

1.036 

0.145 

0.959 

0.530 

0.815 

3.816 

(0.1) 

8.277  14.436 

(0.1)  (0.005) 


(0.01) 

25.611 

(0.005) 


6.499 

(0.1) 


7.674 

(0.025) 


1.926  13.914 

(0.005) 


*  The  numbers  in  parenthesis  represent  significance  levels. 

Only  the  polynomial  and  superpolynomial  models  for  BC  fail  the  test  for 

homoscedasticity,  which  they  fail  only  at  the  10  percent  significance  level. 

To  perform  the  stability  tests  for  these  two  models  we  multiplied  the  first 
six  observations  by  thereby  insuring  the  equality  of  the  residual 

variances  for  the  two  regressions  based  on  1^  and  I^. 

Only  the  polynomial  and  superpolynomial  models  for  SST,  which  are  the 
better-fitting  models  for  that  algorithm,  pass  the  model  stability  test. 
Stability  of  all  the  models  for  BC  and  CT  is  rejected.  The  problem  is  either 
that  the  asymptotic  behavior  of  the  BC  and  CT  algorithms  is  not  well  described 
by  any  of  the  models,  or  that  low-order  effects  confound  our  results  because 
the  samples  do  not  include  enough  sufficiently  large  problem  sizes. 


Because  the  models  generally  fall  the  stability  tests ,  the  results  for 
larger  problem  sizes  may  be  better  predictors  of  asymptotic  behavior  than  are 
the  results  for  all  observations.  Tables  8,  9,  and  10  present  the  estimates 
for  SST,  CT,  and  BC,  respectively,  for  the  three  models  based  on  data  for  the 
smaller  half  and  larger  half  of  the  problem  sizes.  These  Tables  dramatize 
why  the  models  for  BC  and  CT  do  not  survive  the  stability  tests.  While  the 
exponential  model  remains  best  for  each  half  of  the  BC  data,  the  exponent 
declines  about  thirty  percent,  from  0.0245  to  0.0170  (and  the  constant  term 
increases  from  -2.913  to  -1.568)  from  the  first  to  the  second  half.  These 
changes  are  far  greater  than  could  be  expected  from  sampling  error  (as  pre¬ 
dicted  by  the  standard  errors  of  a  and  g),  which  is  why  the  model  stability 
test  fails.  The  two  functions  predict  equal  solution  times  for  problems 
roughly  of  size  179,  which  lies  between  the  subsets  of  smaller  and  larger 
problem  sizes  on  which  the  estimates  are  based.  Overall,  there  is  little 
difference  in  the  within-sample  predictive  ability  of  the  three  models 
(as  judged  by  S). 

The  polynomial  model  remains  best  for  smaller  problem  sizes  for  the 
Carpaneto-Toth  algorithm,  with  the  superpolynomial  model  a  close  contender. 

For  the  larger  problem  sizes,  the  exponential  model  is  best,  although  there 
is  little  to  choose  among  the  three  models.  For  all  three  models,  the  coef¬ 
ficient  of  the  problem  size  explanatory  variable  decreases  from  the  first 
(smaller  problems)  to  the  second  (larger  problems)  regression.  That  is,  the 
logarithms  of  solution  times  for  smaller  problems  are  more  sensitive  to  problem 
size  than  are  the  logs  of  solution  times  for  larger  problems. 

For  the  SST  algorithm,  the  exponential  model,  which  fits  the  entire  data 
set  worst,  fits  the  two  subsets  best,  although  the  differences  among  models 
are  not  great,  especially  for  the  larger  problem  sizes.  Again,  the  sensitivity 
of  log  solution  times  to  problem  size  declines  for  all  three  models,  dramatically 
for  the  exponential. 
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Table  8.  Least 

squares  estimates  for 

the  SST  a 

leorithm. 

A 

a 

<s.) 

a 

s 

<v 

S 

R2 

DW 

(Signif ) 

4°  <  a  <  100 

Polynomial 

-12.71 

(2.77) 

3.578 

(0.657) 

0.5276 

0.827 

2.72 

Superpolynomia 1 

-5.377 

(1.358) 

0.434 

(0.0757) 

0.5047 

0.841 

2.86 

Exponential 

-1.553 

(0.626) 

0.0554 

(0.00860) 

0.4550 

0.871 

3.22 

120  <  n  <  180 

Polynomial 

-11.36 

(7.26) 

3.243 

(1.451) 

0.5183 

0.400 

1.63 

Superpolynomial 

-3.312 

(3.625) 

0.327 

(0.145) 

0.5155 

0.406 

1.64 

Exponential 

1.493 

(1.442) 

0.0225 

(0.00953) 

0.5041 

0.432 

1.68 

Notes :  see  Table  6. 


A 

a 

<sO 

ot 

9 

<s9) 

S 

R2 

DW 

(Signif ) 

40  <  n  <  120 

Polynomial 

-10.07 

(0.661) 

2.693 

(0.153) 

0.1325 

0.987 

3.10 

Superpolynomial 

-4.408 

(0.365) 

0.318 

(0.0192) 

0.1409 

0.986 

2.59 

Exponential 

-1.370 

(0.345) 

0.0364 

(0.00407) 

0.2574 

0.952 

1.55 

160  <  n  <  240 

Polynomial 

-3.791 

(2.293) 

1.407 

(0.434) 

0.1389 

0.704 

1.93 

Superpolynomial 

-0.0992 

(1.131) 

0.134 

(0.0404) 

0.1365 

0.715 

1.95 

Exponential 

2.189 

(0.404) 

0.00730 

(0.00200) 

0.1265 

0.755 

2.03 

Notes:  see  Table  6 


a 

ot 

(s.) 

B 

<*$> 

S 

K 

DW 

(Signif ) 

5°  <  n  <  175 

Polynomial 

-11.33 

(1.29) 

2.407 

(0.277) 

0.2879 

0.937 

1.72 

Superpolynomial 

-5.954 

(0.563) 

0.267 

(0.0255) 

0.2415 

0.956 

1.89 

Exponential 

-2.913 

(0.138) 

0.0245 

(0.00115) 

0.1203 

0.989 

3.15 

200  <  n  <  325 

Polynomial 

-21.43 

(1.37) 

4.379 

(0.247) 

0.1001 

0.984 

2.10 

Superpolynomial 

-9.319 

(0.652) 

0.395 

(0.0211) 

0.0947 

0.986 

2.30 

Exponential 

-1.568 

(0.221) 

0.0170 

(0.00083) 

0.0867 

0.988 

2.95 

Notes:  see  Table  6 


In  summary,  these  statistics  do  not  offer  much  basis  for  deciding  which 
of  the  three  models  describes  best  the  performance  of  the  three  TSP  algorithms 
They  nevertheless  convey  the  important  information  that  the  best  fitting  poly¬ 
nomial  functions  have  exponent  around  3  (for  the  polynomial  model,  2.2  <  P  < 

3.2),  while  the  best  fitting  exponential  functions  have  base  around  1.02  (for 

0.014  0  033 

the  exponential  model,  the  base  b  satisfies  e  <  b  <  e  *  ).  All  three 

approximating  functions  for  all  three  algorithms  are  shown  in  Table  11. 

a 

Table  11.  Best  ADoroximatina  Functions 


“'-vAlgor  i  thm 

SST 

CT 

BC 

Function  typfe's^ 

40  <  n  <  180 

40  <  n  <  240 

50  <  n  <  325 

Polynomial 

1.13xl0_5xn3*243 

0.27Xl0_3Xn2'256 

0.5xl0"6xn3'114 

Super- 

O.lSxlO^Xn0,361108® 

0.37  Xl0_1Xn° ’ 2411°8n 

0.87xl0"3xn0,3201°gn 

polynomial 

Exponential 

0.92xe0‘0331n 

0.45xe°*014a 

O.SJxlO^e0-0205- 
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This  paper  reports  the  results  of  a  statistical  analysis  of  the  per¬ 
formance  of  three  branch  and  bound  algorithms  for  the  general  (asymmetric) 
traveling  salesman  problem  on  randomly  generated  test  problems  with  up  to 
325  cities.  Three  types  of  functions,  polynomial,  superpolynomial  (log- 
exponential)  and  exponential,  were  fitted  to  the  performance  data  of  each 
of  the  algorithms  by  least  squares  techniques.  The  three  functions  describe 
almost  equally  well  the  behavior  of  the  algorithms  in  the  range  of  problem 
sizes  examined.  ~ 
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